Abstract. We develop a novel framework for formulating a class of stochastic reachability
problems with state constraints as a stochastic optimal control problem. Previous approaches
to solving these problems are either confined to the deterministic setting or address almost-
sure stochastic notions. In contrast, we propose a new methodology to tackle probabilistic
specifications that are less stringent than almost sure requirements. To this end, we first
establish a connection between two stochastic reach-avoid problems and different classes of
stochastic optimal control problems for diffusions with discontinuous payoff functions. In the
sequel, we focus on one of these classes, the exit-time problem, which addresses both
reachability questions. We derive a weak version of the dynamic programming principle (DPP)
for the value function. Moreover, based on our DPP, we give an alternative characterization
of the value function as the solution to a partial differential equation in the sense of
discontinuous viscosity solutions, along with Dirichlet-type boundary conditions. Finally, we
validate the performance of the proposed framework on the stochastic Zermelo navigation problem.



1. Introduction 

Reachability is a fundamental concept in the study of dynamical systems and, in view of
its applications in engineering, manufacturing, biology, and economics, to name but a few,
has been studied extensively in the control theory literature. One particular
problem that has turned out to be of fundamental importance in engineering is the so-called
"reach-avoid" problem. In the deterministic setting this problem consists of determining the set
of initial conditions for which one can find at least one control strategy to steer the system to
a target set while avoiding certain obstacles. The set representing the solution to this problem
is known as the capture basin [Aub91]. This problem finds applications in, e.g., air traffic
management [LTS00] and security of power networks [EVM+10]. A direct approach to computing the
capture basin is formulated in the language of viability theory in [Car96, CQSP02]. Related problems
involving pursuit-evasion games are solved in, e.g., [ALQ+02, GLQ06] employing tools from non-smooth
analysis, for which computational tools are provided by [CQSP02].

An alternative and indirect approach to reachability involves using level set methods defined
by value functions that characterize appropriate optimal control problems. Employing
dynamic programming techniques for reachability and viability problems in the absence of state
constraints, these value functions can in turn be characterized by solutions to the standard
Hamilton-Jacobi-Bellman (HJB) equations corresponding to these optimal control problems
[Lyg04]. Numerical algorithms based on level set methods were developed by [OS88, Set99]
and have been coded in efficient computational tools by [MT02, Mit05]. Extending the scope of
this technique, the authors of [FG99, BFZ10, ML11] treat the case of time-independent state
constraints and characterize the capture basin by means of a control problem whose value function
is continuous; for the related problems in the hybrid setting see [LTS99, PH07].

In the stochastic setting, different probabilistic analogs of reachability problems have been
studied extensively. Almost-sure stochastic viability and controlled invariance are treated in
[AD90, Aub91, APF00, BJ02] and the references therein; for the related problems in the
hybrid setting see [BL07, BB09, Buj10]. Methods involving stochastic contingent sets [AP98,
APF00], viscosity solutions of second-order partial differential equations [BPQR98, BG99, BJ02],
and derivatives of the distance function [DF01] were developed in this context. In [DF04] the
authors developed an equivalence for the invariance problem between a stochastic differential
equation and a certain deterministic control system. For the same problem, the authors
of [ST02] studied the differential properties of the reachable set based on the geometric partial
differential equation which is the analogue of the HJB equation for this problem. Recently,
following the same approach, the obstacle version of this Geometric Dynamic Programming
Principle has been addressed in [BV10].

Although almost sure versions of reachability specifications are interesting in their own right, 
they may be too strict a concept in some applications. For example, in the safety assessment 
context, a common specification involves bounding the probability that undesirable events take 
place. Motivated by this, in this article we develop a new framework for solving the following 
stochastic reach-avoid problem: 

RA: Given an initial state $x \in \mathbb{R}^n$, a horizon $T > 0$, a number $p \in [0,1]$, and
two disjoint sets $A, B \subset \mathbb{R}^n$, determine whether there exists a policy such that
the controlled process reaches $A$ prior to entering $B$ within the interval $[0,T]$
with probability at least $p$.

Observe that this is a significantly different problem compared to its almost-sure counterpart
referred to above. It is of course immediate that the solution to the above problem is trivial if
the initial state is either in $B$ (in which case it is almost surely impossible) or in $A$ (in which
case there is nothing to do). However, for generic initial conditions in $\mathbb{R}^n \setminus (A \cup B)$,
due to the inherent probabilistic nature of the dynamics, the problem of selecting a policy and
determining the probability with which the controlled process reaches the set $A$ prior to hitting $B$
is non-trivial. In addition, we address the following slightly different reach-avoid problem, which,
in contrast to RA above, requires that the process be in the set $A$ at time $T$:

$\widetilde{\mathrm{RA}}$: Given an initial state $x \in \mathbb{R}^n$, a horizon $T > 0$, a number $p \in [0,1]$, and
two disjoint sets $A, B \subset \mathbb{R}^n$, determine whether there exists a policy such that
with probability at least $p$ the controlled process resides in $A$ at time $T$ while
avoiding $B$ on the interval $[0,T]$.

In §2 we formally introduce the stochastic reach-avoid problem RA above. In §3 we characterize
the set of initial conditions that solve the problem RA in terms of level sets of three
different value functions. This provides a connection between this stochastic reach-avoid problem
and three different classes of stochastic optimal control problems. An identical connection
is also established for the related reach-avoid problem $\widetilde{\mathrm{RA}}$ above. One of the three
stochastic optimal control problems alluded to above is the standard exit-time problem
[FS06, p. 6]. In this light, in §4 we focus on the value function corresponding to the exit-time
problem, establish a dynamic programming principle (DPP) for it, and characterize it as the
(discontinuous) viscosity solution of a partial differential equation along with pointwise boundary
conditions, the so-called Dirichlet boundary conditions. §5 presents results connecting those in
§3 and §4, and provides a solution to the stochastic reach-avoid problem in an "$\varepsilon$-conservative"
sense. One may observe that this $\varepsilon$-precision can be made arbitrarily small. To illustrate the
performance of our techniques, the theoretical results developed in the preceding sections are
applied to solve the stochastic Zermelo navigation problem in §6.

Notation 

For the reader's convenience, we provide here a partial list of notation, which is explained in
more detail later in the article:

• $\wedge$ (resp. $\vee$): minimum (resp. maximum) operator;

• $\bar{A}$ (resp. $A^\circ$): closure (resp. interior) of the set $A$;

• $B_r(x)$: open Euclidean ball centered at $x$ with radius $r$;

• $C_r(t,x)$: a cylinder with height and radius $r$, see (20);

• $\mathcal{U}_\tau$: set of $\mathbb{F}_\tau$-progressively measurable maps into $\mathbb{U}$;

• $\mathcal{T}_{[\tau_1,\tau_2]}$: the collection of all $\mathbb{F}_{\tau_1}$-stopping times $\tau$ satisfying $\tau_1 \le \tau \le \tau_2$ $\mathbb{P}$-a.s.;

• $(X_s^{t,x;u})_{s \ge 0}$: stochastic process under the control policy $u$, with the convention $X_s^{t,x;u} := x$
for all $s \le t$;

• $\tau_A$: first entry time to $A$, see Definition 2.2;

• $V^*$ (resp. $V_*$): upper semicontinuous (resp. lower semicontinuous) envelope of the function $V$;

• $\mathrm{USC}(\mathbb{S})$ (resp. $\mathrm{LSC}(\mathbb{S})$): collection of all upper semicontinuous (resp. lower
semicontinuous) functions from $\mathbb{S}$ to $\mathbb{R}$;

• $\mathcal{L}^u$: Dynkin operator, see Definition 4.9.

2. Problem Statement 

Consider a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ whose filtration $\mathbb{F} = (\mathcal{F}_s)_{s \ge 0}$ is generated
by an $n$-dimensional Brownian motion $(W_s)_{s \ge 0}$ adapted to $\mathbb{F}$. Let the natural filtration of the
Brownian motion $(W_s)_{s \ge 0}$ be enlarged by its right-continuous completion, so that it satisfies the usual
conditions of completeness and right continuity and $(W_s)_{s \ge 0}$ is a Brownian motion with respect to $\mathbb{F}$
[KS91, p. 48].

For every $t \ge 0$, we introduce an auxiliary subfiltration $\mathbb{F}_t := (\mathcal{F}_{t,s})_{s \ge 0}$, where $\mathcal{F}_{t,s}$ is the
$\mathbb{P}$-completion of $\sigma(W_r - W_t,\ t \le r \le t \vee s)$. Note that for $s \le t$, $\mathcal{F}_{t,s}$ is the trivial $\sigma$-algebra,
and any $\mathcal{F}_{t,s}$-measurable random variable is independent of $\mathcal{F}_t$. By definition, $\mathcal{F}_{t,s} \subseteq \mathcal{F}_s$,
with equality in case $t = 0$.

Let $\mathbb{U} \subset \mathbb{R}^m$ be a control set, and let $\mathcal{U}_t$ denote the set of $\mathbb{F}_t$-progressively measurable maps
into $\mathbb{U}$.^1 We employ the shorthand $\mathcal{U}$ instead of $\mathcal{U}_0$ for the set of all $\mathbb{F}$-progressively measurable
policies. We also denote by $\mathcal{T}$ the collection of all $\mathbb{F}$-stopping times. For $\tau_1, \tau_2 \in \mathcal{T}$ with $\tau_1 \le \tau_2$
$\mathbb{P}$-a.s., the subset $\mathcal{T}_{[\tau_1,\tau_2]}$ is the collection of all $\mathbb{F}_{\tau_1}$-stopping times $\tau$ such that $\tau_1 \le \tau \le \tau_2$ $\mathbb{P}$-a.s.
Note that all $\mathbb{F}_\tau$-stopping times and $\mathbb{F}_\tau$-progressively measurable processes are independent of
$\mathcal{F}_\tau$.

The basic object of our study is the $\mathbb{R}^n$-valued stochastic differential equation (SDE)

(1)   $dX_s = f(X_s, u_s)\,ds + \sigma(X_s, u_s)\,dW_s, \qquad X_0 = x, \quad s \ge 0,$

where $f : \mathbb{R}^n \times \mathbb{U} \to \mathbb{R}^n$ and $\sigma : \mathbb{R}^n \times \mathbb{U} \to \mathbb{R}^{n \times n}$ are measurable maps, $(W_s)_{s \ge 0}$ is the above
standard $n$-dimensional Brownian motion, and $u := (u_s)_{s \ge 0} \in \mathcal{U}$.

^1 Recall [KS91, p. 4] that a $\mathbb{U}$-valued process $(y_s)_{s \ge 0}$ is $\mathbb{F}_t$-progressively measurable if for each $T > 0$ the
function $\Omega \times [0,T] \ni (\omega, s) \mapsto y(\omega, s) \in \mathbb{U}$ is measurable, where $\Omega \times [0,T]$ is equipped with
$\mathcal{F}_{t,T} \otimes \mathfrak{B}([0,T])$, $\mathbb{U}$ is equipped with $\mathfrak{B}(\mathbb{U})$, and $\mathfrak{B}(S)$ denotes the Borel $\sigma$-algebra on a topological space $S$.

Assumption 2.1. We stipulate that

a. $\mathbb{U} \subset \mathbb{R}^m$ is compact;

b. $f$ is continuous and Lipschitz in its first argument uniformly with respect to the second;

c. $\sigma$ is continuous and Lipschitz in its first argument uniformly with respect to the second.

It is known [Bor05, YZ99] that under Assumption 2.1 there exists a unique strong solution to
the SDE (1). By definition of the filtration $\mathbb{F}$, the control functions $u \in \mathcal{U}$ satisfy
the non-anticipativity condition [Bor05]; that is, the increment $W_t - W_s$ is independent of the
past history $\{W_r, u_r \mid r \le s\}$ of the Brownian motion and the control for every $s \in [0,t[$. (In
other words, $u$ does not anticipate the future increments of $W$.) We let $(X_s^{t,x;u})_{s \ge t}$ denote the
unique strong solution of (1) starting from time $t$ at the state $x$ under the control policy $u$.
For future notational simplicity, we slightly modify the definition of $X_s^{t,x;u}$ and extend it to the
whole interval $[0,T]$ by setting $X_s^{t,x;u} := x$ for all $s$ in $[0,t]$. Measurability on $\mathbb{R}^n$ will always refer
to Borel measurability. In the sequel the complement of a set $S \subset \mathbb{R}^n$ is denoted by $S^c$.
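
To make the object in (1) concrete, the following is a minimal numerical sketch of one approximate sample path under a given feedback policy, using the Euler-Maruyama scheme. The scheme, the function name `euler_maruyama`, and the callables `f`, `sigma`, `policy` are illustrative assumptions and are not part of the framework developed in this article; the scheme only approximates the strong solution whose existence is stated above.

```python
import numpy as np

def euler_maruyama(f, sigma, policy, x0, t0, T, n_steps, rng):
    """Approximate one path of dX = f(X,u) ds + sigma(X,u) dW on [t0, T].

    f(x, u)      -> drift vector of shape (n,)        (hypothetical callable)
    sigma(x, u)  -> diffusion matrix of shape (n, n)  (hypothetical callable)
    policy(s, x) -> control value u_s                  (hypothetical feedback law)
    """
    dt = (T - t0) / n_steps
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for k in range(n_steps):
        s = t0 + k * dt
        u = policy(s, x)
        # Brownian increment over one step: independent N(0, dt) components.
        dw = rng.normal(0.0, np.sqrt(dt), size=x.size)
        x = x + f(x, u) * dt + sigma(x, u) @ dw
        path[k + 1] = x
    return path
```

With the diffusion term set to zero the scheme reduces to the explicit Euler method for the controlled ordinary differential equation, which gives a simple sanity check.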

Definition 2.2 (First entry time). Given a control $u$, the process $(X_s^{t,x;u})_{s \ge t}$, and a measurable
set $A \subset \mathbb{R}^n$, we introduce^2 the first entry time to $A$:

(2)   $\tau_A(t,x) = \inf\{ s \ge t \mid X_s^{t,x;u} \in A \}.$

In view of [EK86, Theorem 1.6, Chapter 2], $\tau_A(t,x)$ is an $\mathbb{F}_t$-stopping time.

Remark 2.3. By Definition 2.2 and $\mathbb{P}$-a.s. continuity of sample paths, it can be easily deduced
that given $u \in \mathcal{U}$:

(3a)   $\tau_{A \cup B} = \tau_A \wedge \tau_B,$

(3b)   $X_s^{t,x;u} \in A \implies \tau_A \le s,$

(3c)   $A$ is closed $\implies X_{\tau_A}^{t,x;u} \in A.$
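
On a sampled path the first entry time of Definition 2.2 has an obvious discrete analogue, sketched below. The helper name `first_entry_time` and the time-grid representation are assumptions made for illustration; on a grid the entry time is only resolved up to the step size, so this is an approximation of $\tau_A$, not the stopping time itself.

```python
import math
import numpy as np

def first_entry_time(times, path, in_set):
    """Discrete analogue of tau_A: first grid time at which the path lies in the set.

    times  : increasing array of grid times
    path   : array of states, path[k] is the state at times[k]
    in_set : vectorized membership test returning a boolean array
    Returns math.inf when the set is never entered (convention inf(empty) = inf).
    """
    hits = np.flatnonzero(in_set(path))
    return float(times[hits[0]]) if hits.size else math.inf
```

Applying the helper to $A$, $B$ and $A \cup B$ on the same sampled path reproduces property (3a) exactly, since the first index belonging to a union is the smaller of the two first indices.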
FIGURE 1. One trajectory hits $A$ prior to $B$ within time $[0,T]$, while the other two
do not; all three start from the initial state $x_0$.

Given an initial time $t$, we define the set $\mathrm{RA}(t,p;A,B)$ as the set of all initial states $x$
for which there exists an admissible control strategy $u \in \mathcal{U}$ such that, with probability
more than $p$, the state trajectory $X^{t,x;u}$ hits the set $A$ before the set $B$ within the time horizon $T$.

^2 By convention, $\inf \emptyset = \infty$.

Definition 2.4 (Reach-Avoid within the interval $[0,T]$).

$\mathrm{RA}(t,p;A,B) := \big\{ x \in \mathbb{R}^n \;\big|\; \exists u \in \mathcal{U} : \ \mathbb{P}^u_{t,x}\big( \exists s \in [t,T],\ X_s^{t,x;u} \in A \ \text{and} \ \forall r \in [t,s]\ X_r^{t,x;u} \notin B \big) > p \big\}.$

We have suppressed the initial condition in the above probabilities, and will continue doing
so in the sequel. A pictorial representation of our problems is given in Figure 1. Our main objective
in this article is to propose a framework for computing RA numerically.
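
For a single fixed policy, the probability appearing in Definition 2.4 can be estimated by plain Monte Carlo simulation, as in the sketch below. This is an illustration only and not the method of this article: it treats a scalar state, uses an Euler-Maruyama discretization, checks the sets $A$ and $B$ only at grid times (so crossings between grid points are missed), and fixes the policy, so the estimate relates to a lower bound on the supremum over policies. The function name and all parameters are hypothetical.

```python
import numpy as np

def reach_avoid_probability(f, sigma, policy, in_A, in_B, x0, t0, T,
                            n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of P(reach A before B within [t0, T]) for a scalar SDE.

    f, sigma, policy, in_A, in_B are vectorized callables (assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    dt = (T - t0) / n_steps
    x = np.full(n_paths, float(x0))
    alive = np.ones(n_paths, dtype=bool)     # paths that entered neither A nor B yet
    success = np.zeros(n_paths, dtype=bool)  # paths that entered A first
    for k in range(n_steps + 1):
        hit_A = alive & in_A(x)
        hit_B = alive & in_B(x)
        success |= hit_A
        alive &= ~(hit_A | hit_B)            # stop paths at the first entry to A or B
        if k == n_steps or not alive.any():
            break
        s = t0 + k * dt
        u = policy(s, x)
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        step = f(x, u) * dt + sigma(x, u) * dw
        x = np.where(alive, x + step, x)     # stopped paths stay frozen
    return float(success.mean())
```

A state $x$ would then be declared a member of $\mathrm{RA}(t,p;A,B)$ only if some policy pushes this probability above $p$; the simulation addresses one candidate policy at a time.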

3. Connection to Stochastic Optimal Control Problem 

In this section we establish a connection between the stochastic reach-avoid problem RA and
three different classes of stochastic optimal control problems. One can think of several different
ways of characterizing probabilistic reach-avoid sets; see, e.g., [CCL11] and the references therein
dealing with discrete-time problems. Motivated by these works, we consider value functions
involving expectations of indicator functions of certain sets. Three alternative characterizations
are considered, and we show that all three are equivalent. Consider the value functions
$V_i : [0,T] \times \mathbb{R}^n \to [0,1]$ for $i = 1, 2, 3$, defined as follows:

(4a)   $V_1(t,x) := \sup_{u \in \mathcal{U}} \mathbb{E}\big[ \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \big], \quad \text{where } \bar\tau := \tau_{A \cup B} \wedge T,$

(4b)   $V_2(t,x) := \sup_{u \in \mathcal{U}} \mathbb{E}\Big[ \sup_{s \in [t,T]} \Big\{ \mathbb{1}_A(X_s^{t,x;u}) \wedge \inf_{r \in [t,s]} \mathbb{1}_{B^c}(X_r^{t,x;u}) \Big\} \Big],$

(4c)   $V_3(t,x) := \sup_{u \in \mathcal{U}} \ \sup_{\tau \in \mathcal{T}_{[t,T]}} \ \inf_{\sigma \in \mathcal{T}_{[t,\tau]}} \mathbb{E}\big[ \mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) \big].$

Here $\tau_{A \cup B}$ is the hitting time introduced in Definition 2.2, and depends on the initial condition
$(t,x)$. Also note that for a measurable function $f : \mathbb{R}^n \to \mathbb{R}$, hereinafter $\mathbb{E}[f(X_{\bar\tau}^{t,x;u})]$ stands for
the conditional expectation given the initial condition $(t,x)$ and under the control $u$. For notational
simplicity, we drop the initial condition in this section.

The first result of this section, Proposition 3.2, asserts that
$\mathbb{E}[\mathbb{1}_A(X_{\bar\tau}^{t,x;u})] = \mathbb{P}^u_{t,x}(\tau_A < \tau_B,\ \tau_A \le T)$. Since $\tau_A$ and $\tau_B$ are $\mathbb{F}$-stopping times,
this shows that the mapping $(t,x) \mapsto \mathbb{E}[\mathbb{1}_A(X_{\bar\tau}^{t,x;u})]$ is well-defined. Furthermore, in Proposition 3.3
we shall establish equality of the three functions $V_1, V_2, V_3$, which shows that the other value
functions are also well-defined.

Assumption 3.1. We assume that the sets A and B are disjoint and closed. 

Proposition 3.2. Consider the system (1), and let $A, B \subset \mathbb{R}^n$ be given. Under Assumptions
2.1 and 3.1 we have

$\mathrm{RA}(t,p;A,B) = \{ x \in \mathbb{R}^n \mid V_1(t,x) > p \},$

where the set RA is the set defined in Definition 2.4 and $V_1$ is the value function defined in (4a).

Proof. In view of Assumption 3.1, the implication (3b), and the definition of the reach-avoid set in
Definition 2.4, we can express the set $\mathrm{RA}(t,p;A,B)$ as

(5)   $\mathrm{RA}(t,p;A,B) = \big\{ x \in \mathbb{R}^n \;\big|\; \exists u \in \mathcal{U} : \ \mathbb{P}^u_{t,x}(\tau_A < \tau_B \ \text{and} \ \tau_A \le T) > p \big\}.$

Also, by Assumption 3.1, the properties (3a) and (3c), and the definition of the stopping time $\bar\tau$ in
(4a), given $u \in \mathcal{U}$ we have

$X_{\bar\tau}^{t,x;u} \in A \implies \tau_A \le \bar\tau \ \text{and} \ \bar\tau \ne \tau_B \implies T \ge \bar\tau = \tau_A < \tau_B,$

which means that the sample path $X_\cdot^{t,x;u}$ hits the set $A$ before $B$ at the time $\bar\tau \le T$. Moreover,

$X_{\bar\tau}^{t,x;u} \notin A \implies \bar\tau \ne \tau_A \implies \bar\tau = (\tau_B \wedge T) < \tau_A,$

and this means that the sample path does not succeed in reaching $A$ while avoiding the set $B$ within
time $T$. Therefore, the event $\{\tau_A < \tau_B \ \text{and} \ \tau_A \le T\}$ is equivalent to $\{X_{\bar\tau}^{t,x;u} \in A\}$, and

$\mathbb{P}^u_{t,x}(\tau_A < \tau_B \ \text{and} \ \tau_A \le T) = \mathbb{E}\big[ \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \big].$

This, in view of (5) and the arbitrariness of the control strategy $u \in \mathcal{U}$, leads to the assertion. □

Proposition 3.3. Consider the system (1), and let $A, B \subset \mathbb{R}^n$ be given. Under Assumptions
2.1 and 3.1 we have

$V_1 = V_2 = V_3 \quad \text{on } [0,T] \times \mathbb{R}^n,$

where the value functions $V_1, V_2, V_3$ are as defined in (4).

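Proof. We first establish the equality $V_1 = V_2$. To this end, let us fix $u \in \mathcal{U}$ and $(t,x)$ in
$[0,T] \times \mathbb{R}^n$. Observe that it suffices to show that pointwise on $\Omega$,

$\mathbb{1}_A(X_{\bar\tau}^{t,x;u}) = \sup_{s \in [t,T]} \Big\{ \mathbb{1}_A(X_s^{t,x;u}) \wedge \inf_{r \in [t,s]} \mathbb{1}_{B^c}(X_r^{t,x;u}) \Big\}.$

According to Assumption 3.1 and Remark 2.3, one can see that

$\sup_{s \in [t,T]} \Big\{ \mathbb{1}_A(X_s^{t,x;u}) \wedge \inf_{r \in [t,s]} \mathbb{1}_{B^c}(X_r^{t,x;u}) \Big\} = 1$

$\iff \exists s \in [t,T] : \ X_s^{t,x;u} \in A \ \text{and} \ \forall r \in [t,s]\ X_r^{t,x;u} \in B^c$

$\iff \exists s \in [t,T] : \ \tau_A \le s \le T \ \text{and} \ \tau_B > s$

$\iff X_{\bar\tau}^{t,x;u} = X_{\tau_A \wedge \tau_B \wedge T}^{t,x;u} = X_{\tau_A}^{t,x;u} \in A$

$\iff \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) = 1,$

and since the functions take values in $\{0,1\}$, we have $V_1(t,x) = V_2(t,x)$.

As a first step towards proving $V_1 = V_3$, we start by establishing $V_3 \ge V_1$. It is straightforward
from the definition that

(6)   $\sup_{\tau \in \mathcal{T}_{[t,T]}} \inf_{\sigma \in \mathcal{T}_{[t,\tau]}} \mathbb{E}\big[ \mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) \big] \ \ge \ \inf_{\sigma \in \mathcal{T}_{[t,\bar\tau]}} \mathbb{E}\big[ \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) \big],$

where $\bar\tau$ is the stopping time defined in (4a). For all stopping times $\sigma \in \mathcal{T}_{[t,\bar\tau]}$, in view of (3b)
we have

$\mathbb{1}_{B^c}(X_\sigma^{t,x;u}) = 0 \implies X_\sigma^{t,x;u} \in B \implies \tau_B \le \sigma \le \bar\tau = \tau_A \wedge \tau_B \wedge T$

$\implies \tau_B = \sigma = \bar\tau < \tau_A \implies X_{\bar\tau}^{t,x;u} \notin A$

$\implies \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) = 0.$

This implies that for all $\sigma \in \mathcal{T}_{[t,\bar\tau]}$,

$\mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) = \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \quad \mathbb{P}\text{-a.s.},$

which, in connection with (6), leads to

$\sup_{\tau \in \mathcal{T}_{[t,T]}} \inf_{\sigma \in \mathcal{T}_{[t,\tau]}} \mathbb{E}\big[ \mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) \big] \ \ge \ \mathbb{E}\big[ \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \big].$

By arbitrariness of the control strategy $u \in \mathcal{U}$, we get $V_3 \ge V_1$. It remains to show $V_3 \le V_1$.
Given $u \in \mathcal{U}$ and $\tau \in \mathcal{T}_{[t,T]}$, let us choose $\bar\sigma := \tau \wedge \tau_B$. Note that since $t \le \bar\sigma \le \tau$, we have $\bar\sigma \in \mathcal{T}_{[t,\tau]}$.
Hence,

(7)   $\inf_{\sigma \in \mathcal{T}_{[t,\tau]}} \mathbb{E}\big[ \mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) \big] \ \le \ \mathbb{E}\big[ \mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_{\bar\sigma}^{t,x;u}) \big].$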



ON STOCHASTIC REACH-AVOID PROBLEM AND SET CHARACTERIZATION FOR DIFFUSIONS 7 

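Note that by an argument similar to the proof of Proposition 3.2, for all $\tau \in \mathcal{T}_{[t,T]}$, with $\bar\sigma := \tau \wedge \tau_B$:

$\mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_{\bar\sigma}^{t,x;u}) = 1 \implies X_\tau^{t,x;u} \in A \ \text{and} \ X_{\bar\sigma}^{t,x;u} \notin B$

$\implies \tau_A \le \tau \le T \ \text{and} \ \bar\sigma \ne \tau_B$

$\implies \tau_A \le \tau \le T \ \text{and} \ \bar\sigma = \tau < \tau_B$

$\implies \bar\tau = \tau_A \wedge \tau_B \wedge T = \tau_A \implies \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) = 1.$

It follows that for all $\tau \in \mathcal{T}_{[t,T]}$,

$\mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_{\bar\sigma}^{t,x;u}) \le \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \quad \mathbb{P}\text{-a.s.},$

which in connection with (7) leads to

$\sup_{\tau \in \mathcal{T}_{[t,T]}} \inf_{\sigma \in \mathcal{T}_{[t,\tau]}} \mathbb{E}\big[ \mathbb{1}_A(X_\tau^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) \big] \ \le \ \mathbb{E}\big[ \mathbb{1}_A(X_{\bar\tau}^{t,x;u}) \big].$

By arbitrariness of the control strategy $u \in \mathcal{U}$ we arrive at $V_3 \le V_1$. □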
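
The pathwise identity behind the equality of the first two value functions in (4) can be checked on a time-discretized path. The sketch below is only an illustration under the assumption that the path is observed on a finite grid and that membership in the disjoint sets $A$ and $B$ is given as boolean arrays; the function names are hypothetical.

```python
import numpy as np

def stopped_indicator(in_A, in_B):
    """Indicator of A at the stopped time (first entry to A or B, capped at the horizon).

    in_A, in_B : boolean arrays over the time grid (A and B assumed disjoint).
    """
    hits = np.flatnonzero(in_A | in_B)
    k = hits[0] if hits.size else len(in_A) - 1   # last grid index plays the role of T
    return int(in_A[k])

def sup_inf_indicator(in_A, in_B):
    """Grid version of sup_s { 1_A(X_s) ^ inf_{r <= s} 1_{B^c}(X_r) }."""
    avoided = np.logical_and.accumulate(~in_B)    # running "B not visited so far"
    return int(np.any(in_A & avoided))
```

Both functions return 1 exactly when the sampled path enters $A$ at some grid time before it has touched $B$, which is the discrete counterpart of the event $\{\tau_A < \tau_B,\ \tau_A \le T\}$.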

We now introduce the reach-avoid problem $\widetilde{\mathrm{RA}}$ mentioned in §1. The reach-avoid problem in
Definition 2.4 poses a reach objective while avoiding barriers within the interval $[t,T]$. A similar
problem may be formulated as being in the target set at time $T$ while avoiding barriers over the
period $[t,T]$. Namely, we define the set $\widetilde{\mathrm{RA}}(t,p;A,B)$ as the set of all initial states $x$ for which
there exists an admissible control strategy $u \in \mathcal{U}$ such that, with probability more than $p$,
$X_T^{t,x;u}$ belongs to $A$ and the process avoids the set $B$ over the interval $[t,T]$.

Definition 3.4 (Reach-Avoid at the terminal time $T$).

$\widetilde{\mathrm{RA}}(t,p;A,B) := \big\{ x \in \mathbb{R}^n \;\big|\; \exists u \in \mathcal{U} : \ \mathbb{P}^u_{t,x}\big( X_T^{t,x;u} \in A \ \text{and} \ \forall r \in [t,T]\ X_r^{t,x;u} \notin B \big) > p \big\}.$

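One can establish a connection between the new reach-avoid problem in Definition 3.4 and
different classes of stochastic optimal control problems along lines similar to Propositions 3.2
and 3.3. To this end, let us define the value functions $\widetilde{V}_i : [0,T] \times \mathbb{R}^n \to [0,1]$ for $i = 1, 2, 3$, as
follows:

(8a)   $\widetilde{V}_1(t,x) := \sup_{u \in \mathcal{U}} \mathbb{E}\big[ \mathbb{1}_A(X_{\tilde\tau}^{t,x;u}) \big], \quad \text{where } \tilde\tau := \tau_B \wedge T,$

(8b)   $\widetilde{V}_2(t,x) := \sup_{u \in \mathcal{U}} \mathbb{E}\Big[ \mathbb{1}_A(X_T^{t,x;u}) \wedge \inf_{r \in [t,T]} \mathbb{1}_{B^c}(X_r^{t,x;u}) \Big],$

(8c)   $\widetilde{V}_3(t,x) := \sup_{u \in \mathcal{U}} \ \inf_{\sigma \in \mathcal{T}_{[t,T]}} \mathbb{E}\big[ \mathbb{1}_A(X_T^{t,x;u}) \wedge \mathbb{1}_{B^c}(X_\sigma^{t,x;u}) \big].$

In our subsequent work, measurability of the functions $V_i$ and $\widetilde{V}_i$ turns out to be irrelevant;
see Remark 4.8 for details. We state the following proposition concerning assertions identical to
those of Propositions 3.2 and 3.3 for the reach-avoid problem of Definition 3.4.

Proposition 3.5. Consider the system (1), and let $A, B \subset \mathbb{R}^n$ be given. If the set $B$ is closed,
then under Assumption 2.1 we have $\widetilde{\mathrm{RA}}(t,p;A,B) = \{ x \in \mathbb{R}^n \mid \widetilde{V}_1(t,x) > p \}$, where the set $\widetilde{\mathrm{RA}}$
is the set defined in Definition 3.4. Moreover, we have $\widetilde{V}_1 = \widetilde{V}_2 = \widetilde{V}_3$ on $[0,T] \times \mathbb{R}^n$, where the
value functions $\widetilde{V}_1, \widetilde{V}_2, \widetilde{V}_3$ are as defined in (8).

Proof. The proof follows effectively the same arguments as in the proofs of Propositions 3.2 and
3.3. □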
4. Alternative Characterization of Exit-Time Problem

The stochastic control problems introduced in (4a) and (8a) are well known as exit-time
problems [FS06, p. 6]. Note that according to Propositions 3.2 and 3.5, both the problems in
Definitions 2.4 and 3.4 can alternatively be characterized in the framework of exit-time problems;
see (4a) and (8a), respectively. Motivated by this, in this section we present an alternative
characterization of the exit-time problem based on solutions to certain partial differential equations.
To this end, we generalize the value functions to

(9)   $V(t,x) := \sup_{u \in \mathcal{U}_t} \mathbb{E}\big[ \ell(X_{\bar\tau(t,x)}^{t,x;u}) \big], \qquad \bar\tau(t,x) := \tau_O(t,x) \wedge T,$

with

(10)   $\ell : \mathbb{R}^n \to \mathbb{R}$

a given bounded measurable function, and $O$ a measurable set. Note that $\tau_O$ is the stopping
time defined in Definition 2.2; in the case of the value function (4a) one takes $O = A \cup B$.
Note once again that measurability of the function $V$ is irrelevant to our work; see Remark 4.8
for details.

Hereafter we shall restrict our control processes to $\mathcal{U}_t$, where $\mathcal{U}_t$ denotes the collection of all
$\mathbb{F}_t$-progressively measurable processes $u \in \mathcal{U}$. We will show that the function $V$ in (9) is
well-defined; see Fact 4.2. In view of the independence of the increments of Brownian motion, the
restriction of control processes to $\mathcal{U}_t$ is not restrictive, and one can show that the value function in (9)
remains the same if $\mathcal{U}_t$ is replaced by $\mathcal{U}$; see, for instance, [Kry09, Theorem 3.1.7, p. 132] and
[BT11, Remark 5.2].

Our objective is to characterize the value function (9) as a (discontinuous) viscosity solution
of a suitable Hamilton-Jacobi-Bellman equation. We introduce the set $\mathbb{S} := [0,T] \times \mathbb{R}^n$ and
define the lower and upper semicontinuous envelopes of the function $V : \mathbb{S} \to \mathbb{R}$:

$V_*(t,x) := \liminf_{(t',x') \to (t,x)} V(t',x'), \qquad V^*(t,x) := \limsup_{(t',x') \to (t,x)} V(t',x'),$

and also denote by $\mathrm{USC}(\mathbb{S})$ and $\mathrm{LSC}(\mathbb{S})$ the collections of all upper semicontinuous and lower
semicontinuous functions from $\mathbb{S}$ to $\mathbb{R}$, respectively. Note that, by definition, $V_* \in \mathrm{LSC}(\mathbb{S})$ and
$V^* \in \mathrm{USC}(\mathbb{S})$.
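
As a small worked illustration (an example added for orientation, not a statement from this article), consider the indicator of a closed set $A \subset \mathbb{R}^n$, the kind of discontinuous payoff that appears in (4a). Assuming the usual convention that the limits in the envelopes are taken over neighbourhoods that include the point itself, one obtains:

```latex
\[
  V = \mathbb{1}_A,\ A \text{ closed}:
  \qquad
  V^*(x) = \limsup_{x' \to x} \mathbb{1}_A(x') = \mathbb{1}_A(x),
  \qquad
  V_*(x) = \liminf_{x' \to x} \mathbb{1}_A(x') = \mathbb{1}_{A^\circ}(x).
\]
```

The two envelopes therefore differ exactly on $A \setminus A^\circ$, the boundary of $A$: every neighbourhood of a boundary point contains points outside $A$, which forces the lower envelope to $0$ there, while the upper envelope keeps the value $1$ attained at the point itself.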

4.1. Assumptions and Preliminaries. 

Assumption 4.1. In addition to Assumption 2.1, we stipulate the following:

a. (Non-degeneracy) The controlled processes are uniformly non-degenerate, i.e., there exists
$\delta > 0$ such that for all $x \in \mathbb{R}^n$ and $u \in \mathbb{U}$, $\|\sigma\sigma^\top(x,u)\| > \delta$, where $\sigma(x,u)$ is the diffusion term in
SDE (1).

b. (Interior Cone Condition) There are positive constants $h$, $r$ and an $\mathbb{R}^n$-valued bounded map
$\eta : \bar{O} \to \mathbb{R}^n$ satisfying

$B_{rt}\big(x + \eta(x)t\big) \subset O \quad \text{for all } x \in \bar{O} \text{ and } t \in (0,h],$

where $B_r(x)$ denotes an open ball centered at $x$ with radius $r$ and $\bar{O}$ stands for the closure of
the set $O$.

c. (Lower Semicontinuity) The function $\ell$ defined in (10) is lower semicontinuous.

Note that if the set $A$ in §3 is open, then $\ell(\cdot) = \mathbb{1}_A(\cdot)$ satisfies Assumption 4.1.c. The interior
cone condition in Assumption 4.1.b concerns the shape of the set $O$; Figure 2 illustrates two typical
scenarios.

FIGURE 2. Interior cone condition. (a) The condition holds at every point of the boundary.
(b) The condition fails at the point $p$: the only possible interior cone at $p$ is a line.

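Fact 4.2 (Measurability). Consider the system (1), and suppose that Assumption 2.1 holds. Fix
$(t,x,u) \in \mathbb{S} \times \mathcal{U}$ and take an $\mathbb{F}$-stopping time $\theta : \Omega \to [0,T]$. For every measurable function
$f : \mathbb{R}^n \to \mathbb{R}$, the function

$\Omega \ni \omega \mapsto f\big( X_{\theta(\omega)}^{t,x;u}(\omega) \big) \in \mathbb{R}$

is $\mathcal{F}$-measurable. (Recall that $(X_s^{t,x;u})_{s \ge t}$ is the unique strong solution of (1).)

Let us define the function $J : \mathbb{S} \times \mathcal{U} \to \mathbb{R}$:

(11)   $J(t,x,u) := \mathbb{E}\big[ \ell(X_{\bar\tau(t,x)}^{t,x;u}) \big], \qquad \bar\tau(t,x) := \tau_O(t,x) \wedge T.$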



In the following proposition, we establish continuity of $\bar\tau(t,x)$ and lower semicontinuity of
$J(t,x,u)$ with respect to $(t,x)$.

Proposition 4.3. Consider the system (1), and suppose that Assumptions 2.1 and 4.1 hold.
Then for any strategy $u \in \mathcal{U}$ and $(t_0,x_0) \in \mathbb{S}$, $\mathbb{P}$-a.s. the function $(t,x) \mapsto \bar\tau(t,x)$ is continuous
at $(t_0,x_0)$. Moreover, the function $(t,x) \mapsto J(t,x,u)$ defined in (11) is uniformly bounded and
lower semicontinuous:

$J(t,x,u) \le \liminf_{(t',x') \to (t,x)} J(t',x',u).$



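Proof. We first prove continuity of $\bar\tau(t,x)$ with respect to $(t,x)$. Let us take a sequence
$(t_n,x_n) \to (t_0,x_0)$, and let $(X_r^{t_n,x_n;u})_{r \ge t_n}$ be the solution of (1) for a given policy $u \in \mathcal{U}$. Let us recall that
by definition we assume that $X_s^{t,x;u} := x$ for all $s \in [0,t]$. Here we assume that $t_n \le t_0$, but one
can effectively follow the same technique for $t_n > t_0$. It is straightforward to observe
that by the definition of the stochastic integral in (1) we have, for $r \ge t_0$,

$X_r^{t_n,x_n;u} = X_{t_0}^{t_n,x_n;u} + \int_{t_0}^{r} f(X_s^{t_n,x_n;u}, u_s)\,ds + \int_{t_0}^{r} \sigma(X_s^{t_n,x_n;u}, u_s)\,dW_s \quad \mathbb{P}\text{-a.s.}$

Therefore, by virtue of [Kry09, Theorem 2.5.9, p. 83], for all $q \ge 1$ we have

$\mathbb{E}\Big[ \sup_{r \in [t_0,T]} \big\| X_r^{t_0,x_0;u} - X_r^{t_n,x_n;u} \big\|^{2q} \Big] \le C_1(q,T,K)\, \mathbb{E}\Big[ \big\| x_0 - X_{t_0}^{t_n,x_n;u} \big\|^{2q} \Big]$

$\qquad \le 2^{2q-1} C_1(q,T,K)\, \mathbb{E}\Big[ \| x_0 - x_n \|^{2q} + \big\| x_n - X_{t_0}^{t_n,x_n;u} \big\|^{2q} \Big],$

which, in light of [Kry09, Corollary 2.5.12, p. 86], leads to

(12)   $\mathbb{E}\Big[ \sup_{r \in [t_0,T]} \big\| X_r^{t_0,x_0;u} - X_r^{t_n,x_n;u} \big\|^{2q} \Big] \le C_2(q,T,K,\|x_0\|) \big( \| x_0 - x_n \|^{2q} + | t_0 - t_n |^{q} \big).$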
In the above relations $K$ is the Lipschitz constant of $f$ and $\sigma$ mentioned in Assumption 2.1; $C_1$
and $C_2$ are constants depending on the indicated parameters. Hence, in view of Kolmogorov's
Continuity Criterion [Pro05, Corollary 1 Chap. IV, p. 220], one may consider a version of the
stochastic process $X_\cdot^{t,x;u}$ which is continuous in $(t,x)$ in the topology of uniform convergence on
compacts. This implies that $\mathbb{P}$-a.s., for any $\varepsilon > 0$, for all sufficiently large $n$,

(13)   $X_r^{t_n,x_n;u} \in B_\varepsilon\big( X_r^{t_0,x_0;u} \big), \quad \forall r \in [t_n,T],$

where $B_\varepsilon(y)$ denotes the ball centered at $y$ with radius $\varepsilon$. Based on Assumptions 4.1.a and
4.1.b, it is a well-known property of non-degenerate processes that the set of sample paths that
hit the boundary of $O$ and do not enter the set is negligible [RB98, Corollary 3.2, p. 65]. Hence,
by the definition of $\bar\tau$ and (3b), one can conclude that

$\forall \delta > 0,\ \exists \varepsilon > 0, \quad \bigcup_{s \in [t_0,\, \bar\tau(t_0,x_0) - \delta]} B_\varepsilon\big( X_s^{t_0,x_0;u} \big) \cap O = \emptyset \quad \mathbb{P}\text{-a.s.}$

This together with (13) indicates that $\mathbb{P}$-a.s. for all sufficiently large $n$,

$X_r^{t_n,x_n;u} \notin O, \quad \forall r \in [t_n, \bar\tau(t_0,x_0)[\,,$

which in conjunction with $\mathbb{P}$-a.s. continuity of sample paths immediately leads to

(14)   $\liminf_{(t_n,x_n) \to (t_0,x_0)} \bar\tau(t_n,x_n) \ge \bar\tau(t_0,x_0) \quad \mathbb{P}\text{-a.s.}$

On the other hand, by the definition of $\bar\tau$ and Assumptions 4.1.a and 4.1.b, again in view of
[RB98, Corollary 3.2, p. 65],

$\forall \delta > 0,\ \exists s \in [\tau_O(t_0,x_0),\, \tau_O(t_0,x_0) + \delta[\,, \quad X_s^{t_0,x_0;u} \in O^\circ \quad \mathbb{P}\text{-a.s.},$

where $\tau_O$ is the first entry time to $O$, and $O^\circ$ denotes the interior of the set $O$. Hence, in light of
(13), $\mathbb{P}$-a.s. there exists $\varepsilon > 0$, possibly depending on $\delta$, such that for all sufficiently large $n$ we
have $X_s^{t_n,x_n;u} \in B_\varepsilon(X_s^{t_0,x_0;u}) \subset O$. According to the definition of $\tau_O(t_n,x_n)$ and (3b), this implies
$\tau_O(t_n,x_n) \le s < \tau_O(t_0,x_0) + \delta$. From the arbitrariness of $\delta$ and the definition of $\bar\tau$ in (11), it follows
that

$\limsup_{(t_n,x_n) \to (t_0,x_0)} \bar\tau(t_n,x_n) \le \bar\tau(t_0,x_0) \quad \mathbb{P}\text{-a.s.},$

which, in conjunction with (14), yields $\mathbb{P}$-a.s. continuity of the map $(t,x) \mapsto \bar\tau(t,x)$ at $(t_0,x_0)$.

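It remains to show lower semicontinuity of $J$. Note that $J$ is bounded since $\ell$ is. In accordance
with the $\mathbb{P}$-a.s. continuity of $X_r^{t,x;u}$ and $\bar\tau(t,x)$ with respect to $(t,x)$, and Fatou's lemma, we
have

$\liminf_{n \to \infty} J(t_n,x_n,u) = \liminf_{n \to \infty} \mathbb{E}\Big[ \ell\big( X_{\bar\tau(t_n,x_n)}^{t_n,x_n;u} \big) \Big]$

$\qquad = \liminf_{n \to \infty} \mathbb{E}\Big[ \ell\Big( X_{\bar\tau(t_n,x_n)}^{t_n,x_n;u} - X_{\bar\tau(t_n,x_n)}^{t_0,x_0;u} + X_{\bar\tau(t_n,x_n)}^{t_0,x_0;u} - X_{\bar\tau(t_0,x_0)}^{t_0,x_0;u} + X_{\bar\tau(t_0,x_0)}^{t_0,x_0;u} \Big) \Big]$

$\qquad = \liminf_{n \to \infty} \mathbb{E}\Big[ \ell\big( \varepsilon_n + X_{\bar\tau(t_0,x_0)}^{t_0,x_0;u} \big) \Big]$

(15)   $\qquad \ge \mathbb{E}\Big[ \liminf_{n \to \infty} \ell\big( \varepsilon_n + X_{\bar\tau(t_0,x_0)}^{t_0,x_0;u} \big) \Big] \ge \mathbb{E}\Big[ \ell\big( X_{\bar\tau(t_0,x_0)}^{t_0,x_0;u} \big) \Big] = J(t_0,x_0,u),$

where the first inequality in (15) follows from Fatou's lemma, the second from the lower semicontinuity of $\ell$
(Assumption 4.1.c), and $\varepsilon_n \to 0$ $\mathbb{P}$-a.s. as $n$ tends to $\infty$. Note
that by definition $X_{\bar\tau(t_n,x_n)}^{t_0,x_0;u} = x_0$ on the set $\{\bar\tau(t_n,x_n) < t_0\}$. □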

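Remark 4.4. As a consequence of Fact 4.2 and Proposition 4.3, one can observe that for fixed
$(t,x,u) \in \mathbb{S} \times \mathcal{U}$ and an $\mathbb{F}$-stopping time $\theta$ as in Fact 4.2, the function

$\Omega \ni \omega \mapsto J\big( \theta(\omega),\, X_{\theta(\omega)}^{t,x;u}(\omega),\, u \big) \in \mathbb{R}$

is $\mathcal{F}$-measurable.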
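Fact 4.5 (Stability under Concatenation). For every $u$ and $v$ in $\mathcal{U}_t$, and $\theta \in \mathcal{T}_{[t,T]}$,

$\mathbb{1}_{[t,\theta]}\, u + \mathbb{1}_{]\theta,T]}\, v \in \mathcal{U}_t.$

Proposition 4.6 (Strong Markov Property). Consider the system (1) satisfying Assumption
2.1. Then, for a stopping time $\theta \in \mathcal{T}_{[t,T]}$ and an admissible control $u = \mathbb{1}_{[t,\theta]} u_1 + \mathbb{1}_{]\theta,T]} u_2$, where
$u_1, u_2 \in \mathcal{U}_t$, we have

$\mathbb{E}\Big[ \ell\big( X_{\bar\tau(t,x)}^{t,x;u} \big) \,\Big|\, \mathcal{F}_\theta \Big] = \mathbb{1}_{\{\bar\tau(t,x) \le \theta\}}\, \ell\big( X_{\bar\tau(t,x)}^{t,x;u_1} \big) + \mathbb{1}_{\{\bar\tau(t,x) > \theta\}}\, J\big( \theta, X_\theta^{t,x;u_1}, u_2 \big) \quad \mathbb{P}\text{-a.s.}$

Proof. By Definition 2.2, one has

$\mathbb{1}_{\{\theta < \bar\tau(t,x)\}}\, \bar\tau(t,x) = \mathbb{1}_{\{\theta < \bar\tau(t,x)\}} \big( \bar\tau(\theta, X_\theta^{t,x;u_1}) + \theta - t \big) \quad \mathbb{P}\text{-a.s.}$

One can now follow effectively the same computations as in the proof of [BT11, Proposition 5.1]
to arrive at the assertion. □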

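4.2. Dynamic Programming Principle. The following theorem provides a dynamic
programming principle (DPP) for the exit-time problem introduced in (9).

Theorem 4.7 (Exit Time Problem DPP). Consider the system (1), and suppose that Assumptions
2.1 and 4.1 hold. Then for every $(t,x) \in \mathbb{S}$ and for all stopping times $\theta \in \mathcal{T}_{[t,T]}$,

(16)   $V(t,x) \le \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[ \mathbb{1}_{\{\bar\tau(t,x) \le \theta\}}\, \ell\big( X_{\bar\tau(t,x)}^{t,x;u} \big) + \mathbb{1}_{\{\bar\tau(t,x) > \theta\}}\, V^*\big( \theta, X_\theta^{t,x;u} \big) \Big]$

and

(17)   $V(t,x) \ge \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[ \mathbb{1}_{\{\bar\tau(t,x) \le \theta\}}\, \ell\big( X_{\bar\tau(t,x)}^{t,x;u} \big) + \mathbb{1}_{\{\bar\tau(t,x) > \theta\}}\, V_*\big( \theta, X_\theta^{t,x;u} \big) \Big],$

where $V$ is the value function defined in (9).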

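Proof. The proof is based on techniques developed in [BT11]. We assemble an appropriate
covering for the set $\mathbb{S}$, and use this covering to construct a control strategy which satisfies the
required conditions within $\varepsilon$ precision, $\varepsilon > 0$ being pre-assigned and arbitrary.

Proof of (16). Note once again that in view of [Kry09, Theorem 3.1.7, p. 132] and [BT11,
Remark 5.2], $V(t,x) = \sup_{u \in \mathcal{U}_t} J(t,x,u)$, where the value function $V$ is defined in (9). Therefore, for
any $u \in \mathcal{U}_t$ and $(t,x) \in \mathbb{S}$,

$V^*\big( \theta, X_\theta^{t,x;u} \big) \ge V\big( \theta, X_\theta^{t,x;u} \big) \ge J\big( \theta, X_\theta^{t,x;u}, u \big) \quad \mathbb{P}\text{-a.s.}$

According to Proposition 4.6 and using the tower property of conditional expectation [Kal97,
Theorem 5.1], it follows that

$\mathbb{E}\Big[ \ell\big( X_{\bar\tau(t,x)}^{t,x;u} \big) \Big] = \mathbb{E}\Big[ \mathbb{E}\Big[ \ell\big( X_{\bar\tau(t,x)}^{t,x;u} \big) \,\Big|\, \mathcal{F}_\theta \Big] \Big]$

$\qquad = \mathbb{E}\Big[ \mathbb{1}_{\{\bar\tau(t,x) \le \theta\}}\, \ell\big( X_{\bar\tau(t,x)}^{t,x;u} \big) + \mathbb{1}_{\{\bar\tau(t,x) > \theta\}}\, J\big( \theta, X_\theta^{t,x;u}, u \big) \Big]$

$\qquad \le \mathbb{E}\Big[ \mathbb{1}_{\{\bar\tau(t,x) \le \theta\}}\, \ell\big( X_{\bar\tau(t,x)}^{t,x;u} \big) + \mathbb{1}_{\{\bar\tau(t,x) > \theta\}}\, V^*\big( \theta, X_\theta^{t,x;u} \big) \Big],$

where taking the supremum over all admissible controls $u \in \mathcal{U}_t$ leads to the dynamic programming
inequality (16).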
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Proof of (17). Suppose <f> : S — > K is uniformly bounded such that 
(18) <j) £ USC(§) and <j)<V* on §. 

According to (18) and Fact 4.3, given e > 0, for all (to, xq) G S and u £ U to there exists r c > 
such that 

, . 4>{t,x) - e < (f>(t ,x ) < V*(t ,xo), V(f,i)eC ri ((|),io)nS, 

1 ' J (t , x , u) < J(t, x, u) + e, V(i, x) £ C re (i , x ) n S, 

where C r (t,x) is a cylinder defined as: 

(20) C r (t, as) := {(«, y) e R x M" | s e]t - r,t] , \\x - y\\ < r}. 

Moreover, by definition of (11) and (9), given e > and (to,Xo) £ § there exists u\°> x ° £ Ut Q 
such that 

Vi(to,so) < V(t ,x ) < J (to, x , u^ ) + e. 
By the above inequality and (19), one can conclude that given e > 0, for all (to,Xo) £ § there 
exist u\°> x ° £ Ut an d r e r e(toi x o) > such that 

(21) 4>(t,x) -3e < J(t,x,ul°> x °) V((,i)eC r ,(i B ,3;o)n§. 

Therefore, given e > 0, the family of cylinders {C re (t,x) : (t,x) £ S, r € (t 0} xo) > 0} forms an 
open covering of [0,T[xR n . By the Lindeldf covering Theorem [Dug66, Theorem 6.3 Chapter 
VIII], there exists a countable sequence J"i)igN of elements of § x M+ such that 

[0,T[xM"c \JC rt (ti,Xi). 

Note that the implication of (16) simply holds for (t, x) £ {T} x W 1 . Let us construct a sequence 

(C l ) ieNo as 

C°:={T}x]R", C 4 :=a 4 (*i,asi)\ (J C J . 

i<»-i 

By definition C 4 are pairwise disjoint and § c UiGN C*. Furthermore, P- a.s., (9,X e ' x ' u ) £ 
UieNo ana - ^ or a ^ * € No there exists ■u* i,x * g such that 

(22) (f>(t,x)-3e<j(t,x,ul" Xi ), V(i,i)eC'n§. 
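To prove (17), let us fix $u \in \mathcal{U}_t$ and $\theta \in \mathcal{T}_{[t,T]}$. Given $\epsilon > 0$ we define

$$v^\epsilon := \mathbf{1}_{[t,\theta]}\,u + \mathbf{1}_{]\theta,T]} \sum_{i\in\mathbb{N}_0} \mathbf{1}_{C^i}(\theta, X_\theta^{t,x;u})\,u^{t_i,x_i}. \tag{23}$$

Notice that by Fact 4.5, the set $\mathcal{U}_t$ is closed under countable concatenation operations, and
consequently $v^\epsilon \in \mathcal{U}_t$. In view of Proposition 4.6 and (22), it can be deduced that, $\mathbb{P}$-a.e. on $\Omega$
under $v^\epsilon$ in (23),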



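$$\begin{aligned}
\mathbb{E}\big[\ell(X^{t,x;v^\epsilon}_{\bar\tau(t,x)}) \,\big|\, \mathcal{F}_\theta\big]
&= \mathbf{1}_{\{\bar\tau(t,x)\le\theta\}}\,\ell(X^{t,x;u}_{\bar\tau(t,x)}) + \mathbf{1}_{\{\bar\tau(t,x)>\theta\}} \sum_{i\in\mathbb{N}_0} J(\theta, X_\theta^{t,x;u}, u^{t_i,x_i})\,\mathbf{1}_{C^i}(\theta, X_\theta^{t,x;u}) \\
&\ge \mathbf{1}_{\{\bar\tau(t,x)\le\theta\}}\,\ell(X^{t,x;u}_{\bar\tau(t,x)}) + \mathbf{1}_{\{\bar\tau(t,x)>\theta\}} \big(\phi(\theta, X_\theta^{t,x;u}) - 3\epsilon\big).
\end{aligned}$$

By the definition of $V$ and the tower property of conditional expectations,

$$\begin{aligned}
V(t,x) \ge J(t,x,v^\epsilon) &= \mathbb{E}\Big[\mathbb{E}\big[\ell(X^{t,x;v^\epsilon}_{\bar\tau(t,x)}) \,\big|\, \mathcal{F}_\theta\big]\Big] \\
&\ge \mathbb{E}\Big[\mathbf{1}_{\{\bar\tau(t,x)\le\theta\}}\,\ell(X^{t,x;u}_{\bar\tau(t,x)}) + \mathbf{1}_{\{\bar\tau(t,x)>\theta\}}\,\phi(\theta, X_\theta^{t,x;u})\Big] - 3\epsilon\,\mathbb{E}\big[\mathbf{1}_{\{\bar\tau(t,x)>\theta\}}\big].
\end{aligned}$$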



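The arbitrariness of $u \in \mathcal{U}_t$ and $\epsilon > 0$ implies that

$$V(t,x) \ge \sup_{u\in\mathcal{U}_t} \mathbb{E}\Big[\mathbf{1}_{\{\bar\tau(t,x)\le\theta\}}\,\ell(X^{t,x;u}_{\bar\tau(t,x)}) + \mathbf{1}_{\{\bar\tau(t,x)>\theta\}}\,\phi(\theta, X_\theta^{t,x;u})\Big].$$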

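It suffices to find a sequence of continuous functions $(\Phi_i)_{i\in\mathbb{N}}$ such that $\Phi_i \le V_*$ on $\mathbb{S}$ and $\Phi_i$ converges
pointwise to $V_*$. The existence of such a sequence is guaranteed by [Ren99, Lemma 3.5]. Note
that one may set $\phi_n := \min_{m\ge n} \Phi_m$ for $n \in \mathbb{N}$ to preserve the monotonicity of the convergent
sequence $(\phi_i)_{i\in\mathbb{N}}$ [BT11]. Thus, by Fatou's lemma,

$$\begin{aligned}
V(t,x) &\ge \liminf_{i\to\infty}\, \sup_{u\in\mathcal{U}_t} \mathbb{E}\Big[\mathbf{1}_{\{\bar\tau(t,x)\le\theta\}}\,\ell(X^{t,x;u}_{\bar\tau(t,x)}) + \mathbf{1}_{\{\bar\tau(t,x)>\theta\}}\,\phi_i(\theta, X_\theta^{t,x;u})\Big] \\
&\ge \sup_{u\in\mathcal{U}_t} \mathbb{E}\Big[\mathbf{1}_{\{\bar\tau(t,x)\le\theta\}}\,\ell(X^{t,x;u}_{\bar\tau(t,x)}) + \mathbf{1}_{\{\bar\tau(t,x)>\theta\}}\,\liminf_{i\to\infty}\phi_i(\theta, X_\theta^{t,x;u})\Big] \\
&= \sup_{u\in\mathcal{U}_t} \mathbb{E}\Big[\mathbf{1}_{\{\bar\tau(t,x)\le\theta\}}\,\ell(X^{t,x;u}_{\bar\tau(t,x)}) + \mathbf{1}_{\{\bar\tau(t,x)>\theta\}}\,V_*(\theta, X_\theta^{t,x;u})\Big].
\end{aligned}$$

□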



Remark 4.8. The dynamic programming principles in (16) and (17) are introduced in a weaker 
sense than the standard DPP for stochastic optimal control problems [FS06] . To wit, note that 
one does not have to verify the measurability of the value function V defined in (9) to apply our 
DPP. 



4.3. Dynamic Programming Equation. Our objective in this subsection is to demonstrate
how the DPP derived in §4.2 characterizes the value function $V$ as a (discontinuous) viscosity
solution to an appropriate HJB equation with Dirichlet type (pointwise) boundary
conditions. For the general theory of viscosity solutions we refer to [CIL92] and [FS06].

Definition 4.9 (Dynkin Operator). Given $u \in \mathbb{U}$, we denote by $\mathcal{L}^u$ the Dynkin operator (also
known as the infinitesimal generator) associated to the controlled diffusion (1) as

$$\mathcal{L}^u \Phi(t,x) := \partial_t \Phi(t,x) + f(x,u)\cdot\partial_x \Phi(t,x) + \tfrac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^\top(x,u)\,\partial_x^2 \Phi(t,x)\big],$$

where $\Phi$ is a real-valued function smooth on the interior of $\mathbb{S}$, with $\partial_t\Phi$ and $\partial_x\Phi$ denoting the
partial derivatives with respect to $t$ and $x$ respectively, and $\partial_x^2\Phi$ denoting the Hessian matrix
with respect to $x$. We refer to [Kal97, Theorem 17.23] for more details on the above differential
operator.
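The definition can be checked numerically on a smooth test function. The following is a minimal sketch, not part of the original development: the function name `dynkin_operator`, the step size `h`, and the representation of $f$ and $\sigma$ as Python callables returning nested lists are all assumptions made for illustration. Derivatives are approximated by central finite differences.

```python
def dynkin_operator(phi, f, sigma, t, x, u, h=1e-3):
    """Approximate L^u phi(t, x) = d_t phi + f . d_x phi + 0.5 Tr[sigma sigma^T d_x^2 phi].

    phi(t, x) -> float, f(x, u) -> list of n floats,
    sigma(x, u) -> n x m nested list. All names are illustrative assumptions.
    """
    n = len(x)

    def shifted(i, di, j=None, dj=0.0):
        # Copy of x with coordinate i moved by di (and coordinate j by dj).
        y = list(x)
        y[i] += di
        if j is not None:
            y[j] += dj
        return y

    # Central difference in time.
    d_t = (phi(t + h, x) - phi(t - h, x)) / (2 * h)
    # Central differences for the gradient in x.
    grad = [(phi(t, shifted(i, h)) - phi(t, shifted(i, -h))) / (2 * h)
            for i in range(n)]
    # Four-point stencil for every entry of the Hessian in x.
    hess = [[(phi(t, shifted(i, h, j, h)) - phi(t, shifted(i, h, j, -h))
              - phi(t, shifted(i, -h, j, h)) + phi(t, shifted(i, -h, j, -h)))
             / (4 * h * h)
             for j in range(n)] for i in range(n)]

    drift = f(x, u)
    s = sigma(x, u)
    m = len(s[0])
    # a = sigma sigma^T
    a = [[sum(s[i][k] * s[j][k] for k in range(m)) for j in range(n)]
         for i in range(n)]
    trace = sum(a[i][j] * hess[j][i] for i in range(n) for j in range(n))
    return d_t + sum(drift[i] * grad[i] for i in range(n)) + 0.5 * trace
```

For a quadratic test function the stencil is exact up to rounding, so the result can be compared against the value obtained by differentiating by hand.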

Theorem 4.10 (Exit Time DPE). Consider the system (1), and suppose that Assumptions 2.1
and 4.1 hold. Then:

◦ the lower semicontinuous envelope of $V$ introduced in (9) is a viscosity supersolution of

$$-\sup_{u\in\mathbb{U}} \mathcal{L}^u V_*(t,x) \ge 0 \quad\text{on } [0,T[\,\times\,O^c,$$

◦ the upper semicontinuous envelope of $V$ is a viscosity subsolution of

$$-\sup_{u\in\mathbb{U}} \mathcal{L}^u V^*(t,x) \le 0 \quad\text{on } [0,T[\,\times\,O^c,$$

both with Dirichlet type boundary conditions

$$\begin{cases}
V(t,x) = \ell(x) & \forall (t,x) \in [0,T]\times O \quad \text{(Lateral Boundary Condition)},\\
V(T,x) = \ell(x) & \forall x \in \mathbb{R}^n \quad \text{(Terminal Boundary Condition)}.
\end{cases}$$

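Proof. In view of the definition of the stopping time $\bar\tau$, following [RB98, Corollary 3.2, p. 65],
one sees that

$$\bar\tau(t,x) = t, \qquad \forall (t,x) \in \big([0,T]\times O\big) \cup \big(\{T\}\times\mathbb{R}^n\big) \quad \mathbb{P}\text{-a.s.},$$

which immediately implies the boundary conditions.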

Supersolution: For the sake of contradiction, assume that there exist $(t_0,x_0) \in [0,T[\,\times\,O^c$
and a smooth function $\phi : \mathbb{S} \to \mathbb{R}$ satisfying

$$\min_{(t,x)\in\mathbb{S}} (V_* - \phi)(t,x) = (V_* - \phi)(t_0,x_0) = 0$$

such that for some $\delta > 0$

$$-\sup_{u\in\mathbb{U}} \mathcal{L}^u \phi(t_0,x_0) < -2\delta.$$

Notice that, without loss of generality, one can assume that $(t_0,x_0)$ is the strict minimizer of
$V_* - \phi$ [FS06, Lemma II 6.1, p. 87]. Since $\phi$ is smooth, the map $(t,x) \mapsto \mathcal{L}^u\phi(t,x)$ is continuous.
Therefore, there exist $u \in \mathbb{U}$ and $r > 0$ such that $B_r(t_0,x_0) \subset [0,T)\times O^c$ and

$$-\mathcal{L}^u \phi(t,x) < -\delta, \qquad \forall (t,x) \in B_r(t_0,x_0). \tag{24}$$

Let us define the stopping time $\theta(t,x) \in \mathcal{T}_{[t,T]}$ as

$$\theta(t,x) := \inf\{s \ge t : (s, X_s^{t,x;u}) \notin B_r(t_0,x_0)\}, \tag{25}$$

where $(t,x) \in B_r(t_0,x_0)$. Note that by continuity of solutions to (1), $t < \theta(t,x) < T$ $\mathbb{P}$-a.s. for
all $(t,x) \in B_r(t_0,x_0)$. Moreover, selecting $r > 0$ sufficiently small so that $\theta(t,x) < \tau_O$, we have

$$\theta(t,x) < \tau_O \wedge T = \bar\tau(t,x) \quad \mathbb{P}\text{-a.s.}, \qquad \forall (t,x) \in B_r(t_0,x_0). \tag{26}$$
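Applying Itô's formula and using (24), we see that for all $(t,x) \in B_r(t_0,x_0)$,

$$\begin{aligned}
\phi(t,x) &= \mathbb{E}\Big[\phi\big(\theta(t,x), X^{t,x;u}_{\theta(t,x)}\big) + \int_t^{\theta(t,x)} -\mathcal{L}^u\phi\big(s, X_s^{t,x;u}\big)\,ds\Big] \\
&\le \mathbb{E}\Big[\phi\big(\theta(t,x), X^{t,x;u}_{\theta(t,x)}\big)\Big] - \delta\big(\mathbb{E}[\theta(t,x)] - t\big) < \mathbb{E}\Big[\phi\big(\theta(t,x), X^{t,x;u}_{\theta(t,x)}\big)\Big].
\end{aligned}$$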



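Now it suffices to take a sequence $(t_n,x_n,V(t_n,x_n))_{n\in\mathbb{N}}$ converging to $(t_0,x_0,V_*(t_0,x_0))$ to see
that

$$\phi(t_n,x_n) \to \phi(t_0,x_0) = V_*(t_0,x_0).$$

Therefore, for sufficiently large $n$ we have

$$V(t_n,x_n) < \mathbb{E}\Big[\phi\big(\theta(t_n,x_n), X^{t_n,x_n;u}_{\theta(t_n,x_n)}\big)\Big] \le \mathbb{E}\Big[V_*\big(\theta(t_n,x_n), X^{t_n,x_n;u}_{\theta(t_n,x_n)}\big)\Big],$$

which, in accordance with (26), can be expressed as

$$V(t_n,x_n) < \mathbb{E}\Big[\mathbf{1}_{\{\bar\tau(t_n,x_n)\le\theta(t_n,x_n)\}}\,\ell\big(X^{t_n,x_n;u}_{\bar\tau(t_n,x_n)}\big) + \mathbf{1}_{\{\bar\tau(t_n,x_n)>\theta(t_n,x_n)\}}\,V_*\big(\theta(t_n,x_n), X^{t_n,x_n;u}_{\theta(t_n,x_n)}\big)\Big].$$

This contradicts the DPP in (17).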

Subsolution: The subsolution property is proved in a fashion similar to the supersolution
part, but with slightly more care. For the sake of contradiction, assume that there exist
$(t_0,x_0) \in [0,T[\,\times\,O^c$ and a smooth function $\phi : \mathbb{S} \to \mathbb{R}$ satisfying

$$\max_{(t,x)\in\mathbb{S}} (V^* - \phi)(t,x) = (V^* - \phi)(t_0,x_0) = 0$$

such that for some $\delta > 0$

$$-\sup_{u\in\mathbb{U}} \mathcal{L}^u \phi(t_0,x_0) > 2\delta.$$



By continuity of the mapping $(t,x,u) \mapsto \mathcal{L}^u\phi(t,x)$ and compactness of the control set $\mathbb{U}$
(Assumption 2.1.a), there exists $r > 0$ such that for all $u \in \mathbb{U}$

$$-\mathcal{L}^u \phi(t,x) > \delta, \qquad \forall (t,x) \in B_r(t_0,x_0), \tag{27}$$

where $B_r(t_0,x_0) \subset [0,T)\times O^c$. Note that, as in the preceding part, $(t_0,x_0)$ can be considered as the
strict maximizer of $V^* - \phi$, which consequently implies that there exists $\gamma > 0$ such that

$$(V^* - \phi)(t,x) < -\gamma, \qquad \forall (t,x) \in \partial B_r(t_0,x_0), \tag{28}$$



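where $\partial B_r(t_0,x_0)$ stands for the boundary of the ball $B_r(t_0,x_0)$. Let $\theta(t,x) \in \mathcal{T}_{[t,T]}$ be the
stopping time defined in (25). Applying Itô's formula and using (27), one can observe that given
$u \in \mathcal{U}_t$,

$$\begin{aligned}
\phi(t,x) &= \mathbb{E}\Big[\phi\big(\theta(t,x), X^{t,x;u}_{\theta(t,x)}\big) + \int_t^{\theta(t,x)} -\mathcal{L}^{u_s}\phi\big(s, X_s^{t,x;u}\big)\,ds\Big] \\
&\ge \mathbb{E}\Big[\phi\big(\theta(t,x), X^{t,x;u}_{\theta(t,x)}\big)\Big] + \delta\big(\mathbb{E}[\theta(t,x)] - t\big) > \mathbb{E}\Big[\phi\big(\theta(t,x), X^{t,x;u}_{\theta(t,x)}\big)\Big].
\end{aligned}$$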

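Now it suffices to take a sequence $(t_n,x_n,V(t_n,x_n))_{n\in\mathbb{N}}$ converging to $(t_0,x_0,V^*(t_0,x_0))$ to see
that

$$\phi(t_n,x_n) \to \phi(t_0,x_0) = V^*(t_0,x_0).$$

As argued in the supersolution part above, for sufficiently large $n$ and for given $u \in \mathcal{U}_{t_n}$,

$$V(t_n,x_n) > \mathbb{E}\Big[\phi\big(\theta(t_n,x_n), X^{t_n,x_n;u}_{\theta(t_n,x_n)}\big)\Big] > \mathbb{E}\Big[V^*\big(\theta(t_n,x_n), X^{t_n,x_n;u}_{\theta(t_n,x_n)}\big)\Big] + \gamma,$$

where the last inequality is deduced from the fact that $\big(\theta(t_n,x_n), X^{t_n,x_n;u}_{\theta(t_n,x_n)}\big) \in \partial B_r(t_0,x_0)$
together with (28). Thus, in view of (26), we arrive at

$$V(t_n,x_n) > \mathbb{E}\Big[\mathbf{1}_{\{\bar\tau(t_n,x_n)\le\theta(t_n,x_n)\}}\,\ell\big(X^{t_n,x_n;u}_{\bar\tau(t_n,x_n)}\big) + \mathbf{1}_{\{\bar\tau(t_n,x_n)>\theta(t_n,x_n)\}}\,V^*\big(\theta(t_n,x_n), X^{t_n,x_n;u}_{\theta(t_n,x_n)}\big)\Big] + \gamma.$$

This contradicts the DPP in (16), as $\gamma$ is chosen uniformly with respect to $u \in \mathcal{U}_{t_n}$. □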



5. A Connection Between the Reach-Avoid Problem and PDE Characterization

In this section we draw a connection between the reach-avoid problem of §2 and the stochastic
optimal control problems stated in §3. To this end, note that on the one hand, an assumption
on the sets $A$ and $B$ in the reach-avoid problem (Definition 2.4) within the time interval $[0,T]$ is
that they are closed. On the other hand, our solution to the stochastic optimal control problem
(defined in §3 and solved in §4) relies on lower semicontinuity of the payoff function $\ell$ in (9); see
Assumption 4.1.c.

To achieve a reconciliation between the two sets of hypotheses, given sets $A$ and $B$ satisfying
Assumption 3.1, we construct a smaller measurable set $A_\epsilon \subset A^\circ$ such that
$A_\epsilon := \{x \in A \mid \mathrm{dist}(x, A^c) \ge \epsilon\}$ and $A_\epsilon$ satisfies Assumption 4.1.b. Note that this is always possible if
$O := A \cup B$ satisfies Assumption 4.1.b; indeed, simply take $\epsilon < h/2$ to see this, where $h$ is as defined
in Assumption 4.1.b. Figure 3 depicts this case.

Figure 3. Construction of the sets $A_\epsilon$ from $A$ as described in §5.

To be precise, we define

$$V_\epsilon(t,x) := \sup_{u\in\mathcal{U}_t} \mathbb{E}\big[\ell_\epsilon\big(X^{t,x;u}_{\tau_\epsilon}\big)\big], \qquad \tau_\epsilon := \tau_{A_\epsilon\cup B} \wedge T, \tag{29}$$

where the function $\ell_\epsilon : \mathbb{R}^n \to \mathbb{R}$ is defined as

$$\ell_\epsilon(x) := \Big(1 - \frac{\mathrm{dist}(x, A_\epsilon)}{\epsilon}\Big) \vee 0.$$

Here $\mathrm{dist}(x, A) := \inf_{y\in A} \|x - y\|$, where $\|\cdot\|$ stands for the Euclidean norm.
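To make the construction concrete, the following minimal sketch evaluates $\ell_\epsilon$ in one dimension. It assumes, purely for illustration, that $A$ is a closed interval $[\mathrm{lo}, \mathrm{hi}]$ with $2\epsilon \le \mathrm{hi} - \mathrm{lo}$, so that $A_\epsilon = [\mathrm{lo}+\epsilon, \mathrm{hi}-\epsilon]$; the helper name `make_payoff` is hypothetical and not taken from the paper.

```python
def make_payoff(lo, hi, eps):
    """Return the payoff l_eps for the interval A = [lo, hi] (1-D illustration).

    Assumes 2 * eps <= hi - lo, so that A_eps = [lo + eps, hi - eps] is nonempty.
    """
    a_lo, a_hi = lo + eps, hi - eps  # endpoints of A_eps

    def payoff(x):
        # Euclidean distance from x to the interval A_eps.
        dist = max(a_lo - x, 0.0, x - a_hi)
        # l_eps(x) = (1 - dist(x, A_eps) / eps) v 0
        return max(1.0 - dist / eps, 0.0)

    return payoff


l_coarse = make_payoff(-1.0, 1.0, 0.5)   # eps_1 = 0.5
l_fine = make_payoff(-1.0, 1.0, 0.25)    # eps_2 = 0.25
```

In this example `l_fine(x) >= l_coarse(x)` at every point, both functions vanish outside $A$, and both equal 1 on their respective sets $A_\epsilon$, which is the behaviour used in the monotonicity argument of the proof of Theorem 5.1.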

The following Theorem asserts that the above technique affords an $\epsilon$-conservative but precise
way of characterizing the solution to the reach-avoid problem defined in Definition 2.4 in the
framework of §4.

Theorem 5.1. Consider the system (1), and suppose that Assumptions 2.1, 3.1, 4.1.a. and
4.1.b. hold. Then, for all $(t,x) \in [0,T[\,\times\,\mathbb{R}^n$ and $\epsilon_1 \ge \epsilon_2 > 0$, we have $V_{\epsilon_2}(t,x) \ge V_{\epsilon_1}(t,x)$, and
$V(t,x) = \lim_{\epsilon\downarrow 0} V_\epsilon(t,x)$, where the functions $V$ and $V_\epsilon$ are defined as in (4a) and (29) respectively.



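Proof. By definition, the family of sets $(A_\epsilon)_{\epsilon>0}$ is nested and increasing as $\epsilon \downarrow 0$. Therefore,
in view of (3a), $\tau_\epsilon$ is nonincreasing as $\epsilon \downarrow 0$ pathwise on $\Omega$. Moreover, it is easy to see that the
family of functions $\ell_\epsilon$ is increasing as $\epsilon \downarrow 0$. Hence, given an initial condition $(t,x) \in \mathbb{S}$,
an admissible control $u \in \mathcal{U}_t$, and $\epsilon_1 \ge \epsilon_2 > 0$, pathwise on $\Omega$ we have

$$\begin{aligned}
\ell_{\epsilon_2}\big(X^{t,x;u}_{\tau_{\epsilon_2}}\big) < 1 \;&\Longrightarrow\; \tau_{\epsilon_2} = \tau_B \wedge T < \tau_{A_{\epsilon_2}} \le \tau_{A_{\epsilon_1}} \\
&\Longrightarrow\; \tau_{\epsilon_1} = \tau_B \wedge T = \tau_{\epsilon_2} \;\Longrightarrow\; \ell_{\epsilon_2}\big(X^{t,x;u}_{\tau_{\epsilon_2}}\big) \ge \ell_{\epsilon_1}\big(X^{t,x;u}_{\tau_{\epsilon_1}}\big),
\end{aligned}$$

which immediately leads to $V_{\epsilon_2}(t,x) \ge V_{\epsilon_1}(t,x)$. Now let $(\epsilon_i)_{i\in\mathbb{N}}$ be a decreasing sequence
of positive numbers that converges to zero, and for simplicity of notation let $A_n := A_{\epsilon_n}$,
$\tau_n := \tau_{\epsilon_n}$, and $\ell_n := \ell_{\epsilon_n}$. According to the definitions (4a) and (29), we have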

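$$\begin{aligned}
V(t,x) - \lim_{\epsilon\downarrow 0} V_\epsilon(t,x)
&= \sup_{u\in\mathcal{U}_t} \mathbb{E}\big[\mathbf{1}_A\big(X^{t,x;u}_{\bar\tau}\big)\big] - \lim_{n\to\infty}\sup_{u\in\mathcal{U}_t} \mathbb{E}\big[\ell_n\big(X^{t,x;u}_{\tau_n}\big)\big] \\
&= \sup_{u\in\mathcal{U}_t} \mathbb{E}\big[\mathbf{1}_A\big(X^{t,x;u}_{\bar\tau}\big)\big] - \sup_{n\in\mathbb{N}}\sup_{u\in\mathcal{U}_t} \mathbb{E}\big[\ell_n\big(X^{t,x;u}_{\tau_n}\big)\big] && \text{(30a)} \\
&\le \sup_{u\in\mathcal{U}_t} \Big(\mathbb{E}\big[\mathbf{1}_A\big(X^{t,x;u}_{\bar\tau}\big)\big] - \sup_{n\in\mathbb{N}} \mathbb{E}\big[\ell_n\big(X^{t,x;u}_{\tau_n}\big)\big]\Big) \\
&\le \sup_{u\in\mathcal{U}_t} \inf_{n\in\mathbb{N}} \mathbb{E}\big[\mathbf{1}_A\big(X^{t,x;u}_{\bar\tau}\big) - \mathbf{1}_{A_n}\big(X^{t,x;u}_{\tau_n}\big)\big] \\
&= \sup_{u\in\mathcal{U}_t} \inf_{n\in\mathbb{N}} \mathbb{P}^u_{t,x}\big(\{\tau_{A_n} > \tau_B \wedge T\} \cap \{\tau_A \le T\} \cap \{\tau_A < \tau_B\}\big) && \text{(30b)} \\
&= \sup_{u\in\mathcal{U}_t} \mathbb{P}^u_{t,x}\Big(\bigcap_{n\in\mathbb{N}} \{\tau_{A_n} > \tau_B \wedge T\} \cap \{\tau_A \le T\} \cap \{\tau_A < \tau_B\}\Big) && \text{(30c)} \\
&\le \sup_{u\in\mathcal{U}_t} \mathbb{P}^u_{t,x}\big(\{\tau_{A^\circ} > \tau_B \wedge T\} \cap \{\tau_A \le T\} \cap \{\tau_A < \tau_B\}\big) && \text{(30d)} \\
&\le \sup_{u\in\mathcal{U}_t} \mathbb{P}^u_{t,x}\big(\{\tau_{A^\circ} > \tau_A\} \cup \{\tau_A = T\}\big) = 0. && \text{(30e)}
\end{aligned}$$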

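Note that the equality in (30a) is due to the fact that the sequence of value functions
$(V_{\epsilon_n})_{n\in\mathbb{N}}$ is increasing pointwise. One can infer the equality (30b) from the case $\mathbf{1}_A\big(X^{t,x;u}_{\bar\tau}\big) = 1$ and
$\mathbf{1}_{A_n}\big(X^{t,x;u}_{\tau_n}\big) = 0$, as $\mathbf{1}_A\big(X^{t,x;u}_{\bar\tau}\big) \ge \mathbf{1}_{A_n}\big(X^{t,x;u}_{\tau_n}\big)$ pathwise on $\Omega$. Moreover, since the sequence
of stopping times $(\tau_n)_{n\in\mathbb{N}}$ is decreasing $\mathbb{P}$-a.s., the family of sets $(\{\tau_{A_n} > \tau_A\})_{n\in\mathbb{N}}$ is also
decreasing; consequently, the equality (30c) follows. In order to show (30d), it is not hard to
see that

$$\begin{aligned}
\omega \in \bigcap_{n\in\mathbb{N}} \{\tau_{A_n} > \tau_B \wedge T\}
\;&\Longrightarrow\; \forall n \in \mathbb{N},\ \tau_{A_n}(\omega) > \tau_B(\omega) \wedge T \\
&\Longrightarrow\; \forall n \in \mathbb{N},\ \forall s \le \tau_B(\omega) \wedge T,\ X_s^{t,x;u}(\omega) \notin A_n \\
&\Longrightarrow\; \forall s \le \tau_B(\omega) \wedge T,\ X_s^{t,x;u}(\omega) \notin \bigcup_{n\in\mathbb{N}} A_n = A^\circ \\
&\Longrightarrow\; \omega \in \{\tau_{A^\circ} > \tau_B \wedge T\}.
\end{aligned}$$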

Based on non-degeneracy and the interior cone condition in Assumptions 4.1.a. and 4.1.b.
respectively, by virtue of [RB98, Corollary 3.2, p. 65], we see that the set $\{\tau_{A^\circ} > \tau_A\}$ is negligible.
Moreover, the interior cone condition implies that the Lebesgue measure of $\partial A$, the boundary of $A$, is
zero. In light of non-degeneracy and the Girsanov Theorem [KS91, Theorem 5.1, p. 191], $X_r^{t,x;u}$ has a
probability density $d(r,y)$ for $r \in\, ]t,T]$; see [FS06, Section IV.4] and references therein. Hence, the
aforementioned property of $\partial A$ results in $\mathbb{P}^u_{t,x}\{\tau_A = T\} \le \mathbb{P}\big(X_T^{t,x;u} \in \partial A\big) = \int_{\partial A} d(T,y)\,dy = 0$,
and the assertion of the second equality of (30e) follows. It is straightforward to see that $V \ge V_{\epsilon_n}$
pointwise on $\mathbb{S}$ for all $n \in \mathbb{N}$. The assertion now follows at once. □

Remark 5.2. Observe that for the problem of reachability at the time $T$ (as defined in Definition
3.4), the above procedure is unnecessary if the set $A$ is open; see the required conditions for
Proposition 3.5.

The following Theorem addresses continuity of the value function $V_\epsilon$ in (29). It not only simplifies
the PDE characterization developed in §4.3 from the discontinuous to the continuous regime, but
also provides a theoretical justification for existing tools to numerically solve the corresponding
PDE.

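Theorem 5.3. Consider the system in (1), and suppose that Assumptions 2.1 and 4.1 hold.
Then, for any $\epsilon > 0$ the value function $V_\epsilon : \mathbb{S} \to [0,1]$ defined as in (29) is continuous. Furthermore,
if $(A_\epsilon \cup B)^c$ is bounded then $V_\epsilon$ is the unique viscosity solution of

$$\begin{cases}
-\sup_{u\in\mathbb{U}} \mathcal{L}^u V_\epsilon(t,x) = 0 & \text{in } [0,T[\,\times\,(A_\epsilon\cup B)^c,\\
V_\epsilon(t,x) = \ell_\epsilon(x) & \text{on } \big([0,T]\times(A_\epsilon\cup B)\big) \cup \big(\{T\}\times\mathbb{R}^n\big).
\end{cases} \tag{31}$$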



Proof. The PDE characterization of $V_\epsilon$ in (31) is a straightforward consequence of its continuity
and Theorem 4.10. The uniqueness follows from the weak comparison principle [FS06, Theorem
8.1 Chap. VII, p. 274], which in fact requires $(A_\epsilon \cup B)^c$ to be bounded. It then suffices to prove
continuity of the mapping $(t,x) \mapsto \mathbb{E}\big[\ell_\epsilon\big(X^{t,x;u}_{\tau_\epsilon}\big)\big]$ uniformly with respect to $u \in \mathcal{U}$. To this
end, one may consider the version of $X_\cdot^{t,x;u}$ which is almost surely continuous in $(t,x)$ uniformly
with respect to the policy $u$, since the constant $C_2$ in (12) does not depend on $u$. That is, $u$ may
only affect a negligible subset of $\Omega$; we refer to [Pro05, Theorem 72 Chap. IV, p. 218] for further
details on this issue. Hence, all the relations in the proof of Proposition 4.3, in particular (13),
hold if we permit the control policy $u$ to depend on $n$ in an arbitrary way. This last fact implies
that for all $(t,x) \in \mathbb{S}$, $(t_n,x_n) \to (t,x)$, and $(u_n)_{n\in\mathbb{N}} \subset \mathcal{U}$, we have $\lim_{n\to\infty} \tau_\epsilon(t_n,x_n) = \tau_\epsilon(t,x)$
$\mathbb{P}$-a.s., where $\tau_\epsilon$ is as defined in (29). Moreover, according to [Kry09, Corollary 2.5.10, p. 85]

$$\mathbb{E}\Big[\big\|X_r^{t,x;u} - X_s^{t,x;u}\big\|^{2q}\Big] \le C_3(q,T,K,\|x\|)\,|r-s|^q, \qquad \forall r,s \in [t,T],\ \forall q \ge 1,$$



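following the arguments in the proof of Proposition 4.3 in conjunction with the above inequality,
one can also deduce that the mapping $s \mapsto X_s^{t,x;u}$ is $\mathbb{P}$-a.s. continuous uniformly with respect to
$u$. Now the assertion readily follows from Lipschitz continuity of $\ell_\epsilon$, and all continuity notions
around the process $X_\cdot^{t,x;u}$ irrespective of the control policy $u$. That is, setting $\tau_\epsilon := \tau_\epsilon(t,x)$ and
$\tau_\epsilon^n := \tau_\epsilon(t_n,x_n)$, for any $(u_n)_{n\in\mathbb{N}} \subset \mathcal{U}$ we have

$$\begin{aligned}
\limsup_{n\to\infty}\, \mathbb{E}\Big[\big|\ell_\epsilon\big(X^{t_n,x_n;u_n}_{\tau_\epsilon^n}\big) - \ell_\epsilon\big(X^{t,x;u_n}_{\tau_\epsilon}\big)\big|\Big]
&\le \mathbb{E}\Big[\limsup_{n\to\infty} \big|\ell_\epsilon\big(X^{t_n,x_n;u_n}_{\tau_\epsilon^n}\big) - \ell_\epsilon\big(X^{t,x;u_n}_{\tau_\epsilon}\big)\big|\Big] \\
&\le \frac{1}{\epsilon}\,\mathbb{E}\Big[\limsup_{n\to\infty} \big\|X^{t,x;u_n}_{\tau_\epsilon^n} - X^{t,x;u_n}_{\tau_\epsilon}\big\| + \limsup_{n\to\infty} \big\|X^{t_n,x_n;u_n}_{\tau_\epsilon^n} - X^{t,x;u_n}_{\tau_\epsilon^n}\big\|\Big] = 0,
\end{aligned}$$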



where the first inequality follows from Fatou's lemma and uniform boundedness of $\ell_\epsilon$. In the
second line, the first term vanishes due to the almost sure continuity of the stopping times $\tau_\epsilon$ at
$(t,x)$ and of the mapping $s \mapsto X_s^{t,x;u_n}$, and the second term due to almost sure continuity of the
mapping $(t,x) \mapsto X_\cdot^{t,x;u_n}$, all uniformly with respect to $u_n$. □



The following Remark summarizes the preceding results and paves the analytical ground on
which the reach-avoid problem becomes amenable to numerical solution by means of off-the-shelf
PDE solvers.

Remark 5.4. Theorem 5.1 implies that the conservative approximation $V_\epsilon$ can be arbitrarily
precise, i.e., $V(t,x) = \lim_{\epsilon\downarrow 0} V_\epsilon(t,x)$. Theorem 5.3 implies that $V_\epsilon$ is continuous, i.e., the PDE
characterization in Theorem 4.10 can be simplified to the continuous version. Continuous viscosity
solutions can be computed numerically by invoking existing toolboxes, e.g. [Mit05]. The precision
of the numerical solutions can also be made arbitrarily high at the cost of computational time and
storage. In other words, let $V_\epsilon^\delta$ be the numerical solution of $V_\epsilon$ obtained through a numerical
routine, and let $\delta$ be the discretization parameter (grid size) as required by [Mit05]. Then, since
the continuous PDE characterization meets the hypothesis required for the toolbox [Mit05], we
have $V_\epsilon = \lim_{\delta\downarrow 0} V_\epsilon^\delta$. Finally, $V(t,x) = \lim_{\epsilon\downarrow 0}\lim_{\delta\downarrow 0} V_\epsilon^\delta(t,x)$.



6. Numerical Example: Zermelo navigation problem 



To illustrate the theoretical results of the preceding sections, we apply the proposed reach- 
avoid formulation to the Zermelo navigation problem with constraints and stochastic uncertain- 
ties. In control theory, the Zermelo navigation problem consists of a swimmer who aims to reach 
an island (Target) in the middle of a river while avoiding the waterfall, with the river current 
leading towards the waterfall. The situation is depicted in Figure 4. We say that the swimmer 
"succeeds" if he reaches the target before going over the waterfall, the latter forming a part of 
his Avoid set. 



6.1. Mathematical modeling. The dynamics of the river current are nonlinear; we let $f(x,y)$
denote the river current at position $(x,y)$ [CQSP97]. We assume that the current flows with
constant direction towards the waterfall, with the magnitude of $f$ decreasing in distance from the
middle of the river: $f(x,y) := \begin{pmatrix} 1 - a y^2 \\ 0 \end{pmatrix}$. This model may not describe the behavior of a realistic
river current, so we consider some uncertainties in the river current modeled by a diffusion term
as $\sigma(x,y) := \begin{pmatrix} \sigma_x & 0 \\ 0 & \sigma_y \end{pmatrix}$. We assume that the swimmer moves with constant velocity $V_S$, and
we assume that he can change his direction $\alpha$ instantaneously.

Figure 4. Zermelo navigation problem: a swimmer in the river

The complete dynamics of the swimmer in the river is given by

$$\begin{bmatrix} dx_s \\ dy_s \end{bmatrix} = \begin{bmatrix} 1 - a y_s^2 + V_S\cos(\alpha) \\ V_S\sin(\alpha) \end{bmatrix} ds + \begin{bmatrix} \sigma_x & 0 \\ 0 & \sigma_y \end{bmatrix} dW_s, \tag{32}$$

where $W_s$ is a two-dimensional Brownian motion, and $\alpha \in [-\pi, \pi]$ is the direction of the swimmer
with respect to the $x$ axis and plays the role of the controller for the swimmer.
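For intuition, sample paths of (32) can be generated with an Euler-Maruyama scheme. The sketch below is illustrative only and is not the method used to produce the figures of this section. The default parameters follow the values quoted in §6.3 for the second scenario; the feedback law `heading` and the predicates `in_target` and `in_avoid`, which describe the target and avoid sets, are hypothetical and must be supplied by the user. For a fixed heading policy the empirical success frequency only estimates a lower bound on the optimal success probability.

```python
import math
import random


def simulate_swimmer(x0, y0, heading, a=0.04, v_s=0.6, sigma_x=0.5,
                     sigma_y=0.2, horizon=10.0, dt=0.01, rng=None):
    """One Euler-Maruyama path of (32); returns the list of visited (x, y)."""
    if rng is None:
        rng = random.Random(0)
    x, y = x0, y0
    path = [(x, y)]
    sqrt_dt = math.sqrt(dt)
    for k in range(int(round(horizon / dt))):
        alpha = heading(k * dt, x, y)  # swimmer direction, the control input
        dx = ((1.0 - a * y * y + v_s * math.cos(alpha)) * dt
              + sigma_x * sqrt_dt * rng.gauss(0.0, 1.0))
        dy = (v_s * math.sin(alpha) * dt
              + sigma_y * sqrt_dt * rng.gauss(0.0, 1.0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def success_frequency(x0, y0, heading, in_target, in_avoid,
                      n_paths=200, seed=0, **model):
    """Fraction of simulated paths that hit the target before the avoid set."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        for x, y in simulate_swimmer(x0, y0, heading, rng=rng, **model):
            if in_target(x, y):
                hits += 1
                break
            if in_avoid(x, y):
                break
    return hits / n_paths
```

With the noise switched off (`sigma_x=0.0`, `sigma_y=0.0`) and a uniform current (`a=0.0`), a swimmer heading along the $x$ axis moves at the constant speed $1 + V_S$, which gives a simple sanity check of the drift term.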



6.2. Reach-Avoid formulation. Obviously, the probability of the swimmer's "success" starting
from some initial position in the navigation region depends on the starting point $(x,y)$. As
shown in §3, this probability can be characterized as the level set of a value function, and by
Theorem 4.10 this value function is the discontinuous viscosity solution of a certain differential
equation on the navigation region with particular lateral and terminal boundary conditions. The
differential operator $\mathcal{L}$ in Theorem 4.10 can be analytically calculated in this case as follows:



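$$\sup_{u\in\mathbb{U}} \mathcal{L}^u\Phi(t,x,y) = \sup_{\alpha\in[-\pi,\pi]} \Big( \partial_t\Phi(t,x,y) + \big(1 - a y^2 + V_S\cos(\alpha)\big)\,\partial_x\Phi(t,x,y) + V_S\sin(\alpha)\,\partial_y\Phi(t,x,y) + \tfrac{1}{2}\sigma_x^2\,\partial_x^2\Phi(t,x,y) + \tfrac{1}{2}\sigma_y^2\,\partial_y^2\Phi(t,x,y) \Big).$$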



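It can be shown that the differential operator can be simplified to

$$\sup_{u\in\mathbb{U}} \mathcal{L}^u\Phi(t,x,y) = \partial_t\Phi(t,x,y) + (1 - a y^2)\,\partial_x\Phi(t,x,y) + \tfrac{1}{2}\sigma_x^2\,\partial_x^2\Phi(t,x,y) + \tfrac{1}{2}\sigma_y^2\,\partial_y^2\Phi(t,x,y) + V_S\,\|\nabla\Phi(t,x,y)\|,$$

where $\nabla\Phi(t,x,y) := \big[\partial_x\Phi(t,x,y) \ \ \partial_y\Phi(t,x,y)\big]^\top$.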
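The simplification rests on the elementary identity $\sup_{\alpha} V_S\big(\cos(\alpha)\,\partial_x\Phi + \sin(\alpha)\,\partial_y\Phi\big) = V_S\|\nabla\Phi\|$, which follows from the Cauchy-Schwarz inequality and is attained when the swimmer heads along the gradient. The short sketch below checks this numerically for an arbitrary gradient; the function names are illustrative assumptions, and the brute-force search over a grid of angles is only a sanity check, not part of the numerical scheme.

```python
import math


def heading_term(alpha, phi_x, phi_y, v_s):
    """Control-dependent part of the operator for heading alpha."""
    return v_s * (math.cos(alpha) * phi_x + math.sin(alpha) * phi_y)


def optimal_heading(phi_x, phi_y):
    """Heading in [-pi, pi] aligned with the gradient (phi_x, phi_y)."""
    return math.atan2(phi_y, phi_x)


def brute_force_sup(phi_x, phi_y, v_s, n=3600):
    """Maximum of the heading term over a uniform grid of angles in [-pi, pi]."""
    return max(heading_term(-math.pi + 2.0 * math.pi * k / n, phi_x, phi_y, v_s)
               for k in range(n + 1))


# Example gradient (3, -4) with V_S = 0.6: the closed form gives 0.6 * 5 = 3.
closed_form = 0.6 * math.hypot(3.0, -4.0)
at_optimum = heading_term(optimal_heading(3.0, -4.0), 3.0, -4.0, 0.6)
on_grid = brute_force_sup(3.0, -4.0, 0.6)
```

Here `at_optimum` agrees with `closed_form` up to rounding, while `on_grid` approaches it from below as the angular grid is refined.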



6.3. Simulation results. For the following numerical simulations we fix the diffusion coefficients
$\sigma_x = 0.5$ and $\sigma_y = 0.2$. We investigate three different scenarios. First, we assume that
the river current is uniform, i.e., $a = 0\ \mathrm{m^{-1}s^{-1}}$ in (32). Moreover, we consider the case that the
swimmer velocity is less than the current flow, e.g., $V_S = 0.6\ \mathrm{m\,s^{-1}}$. Based on the above calculations,
Figure 5(a) depicts the value function which is the numerical solution of the differential
operator equation in Theorem 4.10 with the corresponding terminal and lateral conditions. As
expected, since the swimmer's speed is less than the river current, if he starts from beyond
the target he has less chance of reaching the island. This scenario is also captured by the value
function shown in Figure 5(a).



Figure 5. The value functions for the different scenarios. (a) The first scenario: the swimmer's speed is slower than the river current, the current being assumed uniform. (b) The second scenario: the swimmer's speed is slower than the maximum river current. (c) The third scenario: the swimmer can swim faster than the maximum river current.



Second, we assume that the river current is non-uniform and decreases with respect to the
distance from the middle of the river. This means that the swimmer, even in the case that his
speed is less than the current, has a non-zero probability of success if he initially swims to the
sides of the river partially against its direction, followed by swimming in the direction of the
current to reach the target. This scenario is depicted in Figure 5(b), where a non-uniform
river current with $a = 0.04\ \mathrm{m^{-1}s^{-1}}$ in (32) is considered.

Third, we consider the case that the swimmer can swim faster than the river current. In this
case we expect the swimmer to succeed with some probability even if he starts from beyond the
target. This scenario is captured in Figure 5(c), where the reachable set (in the probabilistic
sense) covers the entire navigation region of the river except the region near the waterfall.

In the following we show the level sets of the aforementioned value functions for $p = 0.9$. To
wit, as defined in §3 (and in particular in Proposition 3.2), these level sets, roughly speaking,
correspond to the reachable sets with probability $p = 90\%$ in certain time horizons while the
swimmer is avoiding the waterfall. By definition, as shown by the following figures, these sets
are nested with respect to the time horizon.

Figure 6. The level sets of the value functions for the different scenarios. (c) The third scenario: the swimmer can swim faster than the maximum river current.
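Once a value function has been computed on a grid, extracting a level set and checking the nestedness property is straightforward. The sketch below assumes, for illustration only, that the numerical solution is stored as a nested list of values indexed by grid cell and that the level set is taken as the cells with value at least $p$; the toy numbers are hypothetical and do not come from the simulations reported here.

```python
def level_set(value_grid, p):
    """Indices (i, j) of the grid cells whose value is at least p."""
    return {(i, j)
            for i, row in enumerate(value_grid)
            for j, v in enumerate(row)
            if v >= p}


def is_nested(sets):
    """True if each set is contained in the next one."""
    return all(a <= b for a, b in zip(sets, sets[1:]))


# Hypothetical value grids for a short and a long time horizon.
v_short = [[0.10, 0.95], [0.20, 0.50]]
v_long = [[0.30, 0.97], [0.92, 0.60]]
sets = [level_set(v_short, 0.9), level_set(v_long, 0.9)]
```

In this toy example the level set for the longer horizon contains the one for the shorter horizon, so `is_nested(sets)` holds, mirroring the nestedness with respect to the time horizon noted above.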

All simulations were obtained using the Level Set Method Toolbox [Mit05] (version 1.1), with
a 101 × 101 grid in the region of simulation.

7. Concluding Remarks and Future Direction 

In this article we presented a new method to address a class of stochastic reachability problems 
with state constraints. The proposed framework provides a set characterization of the stochastic 
reach-avoid set based on discontinuous viscosity solutions of a second order PDE. In contrast 
to earlier approaches, this methodology is not restricted to almost-sure notions and one can 
compute the desired set with any Zermelo navigation problem.